AITopics | Dunn County

For safety and robustness of AI systems, we introduce topological parallax as a theoretical and computational tool that compares a trained model to a reference dataset to determine whether they have similar multiscale geometric structure. Our proofs and examples show that this geometric similarity between dataset and model is essential to trustworthy interpolation and perturbation, and we conjecture that this new concept will add value to the current debate regarding the unclear relationship between "overfitting" and "generalization" in applications of deep learning. In typical DNN applications, an explicit geometric description of the model is impossible, but parallax can estimate topological features (components, cycles, voids, etc.) in the model by examining the effect on the Rips complex of geodesic distortions using the reference dataset. Thus, parallax indicates whether the model shares similar multiscale geometric features with the dataset. In theoretical terms, parallax is formulated via topological data analysis (TDA) as a bi-filtered persistence module, and the key properties of this module are stable under perturbation of the reference dataset.
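As a rough illustration of the simplest topological feature mentioned in the abstract (components), the sketch below counts connected components of a Rips complex built on a small point cloud at a single scale. It is not the paper's parallax construction: there is no model, no geodesic distortion, and no bi-filtration here. The function name `rips_components`, the Euclidean metric, and the convention that two points are joined when their distance is at most `scale` are all assumptions made for the example.

```python
import math


def rips_components(points, scale):
    """Count connected components of the Rips complex on `points` at `scale`.

    Assumed convention: two points share an edge when their Euclidean
    distance is <= scale. Components depend only on these edges, so the
    higher-dimensional simplices never need to be built.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= scale:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Hypothetical toy point cloud: two nearby points and one far away.
cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
for r in (0.5, 1.0, 4.0):
    print(r, rips_components(cloud, r))
```

Sweeping `scale` from small to large and recording when components merge gives the kind of multiscale summary that persistence diagrams encode for degree zero.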

geometry, parallax, persistence diagram, (15 more...)

arXiv.org Machine Learning

2306.11835

Country:

North America > United States > North Carolina > Durham County > Durham (0.05)
North America > United States > Utah (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(5 more...)

Genre: Research Report (0.41)

Industry: Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


News Verifiers Showdown: A Comparative Performance Evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in News Fact-Checking

Caramancion, Kevin Matthe

arXiv.org Artificial Intelligence | Jun-18-2023

This study aimed to evaluate the proficiency of prominent Large Language Models (LLMs), namely OpenAI's ChatGPT 3.5 and 4.0, Google's Bard (LaMDA), and Microsoft's Bing AI, in discerning the truthfulness of news items using black box testing. A total of 100 fact-checked news items, all sourced from independent fact-checking agencies, were presented to each of these LLMs under controlled conditions. Their responses were classified into one of three categories: True, False, and Partially True/False. The effectiveness of the LLMs was gauged based on the accuracy of their classifications against the verified facts provided by the independent agencies. The results showed a moderate proficiency across all models, with an average score of 65.25 out of 100. Among the models, OpenAI's GPT-4.0 stood out with a score of 71, suggesting an edge in newer LLMs' abilities to differentiate fact from deception. However, when juxtaposed against the performance of human fact-checkers, the AI models, despite showing promise, lag in comprehending the subtleties and contexts inherent in news information. The findings highlight the potential of AI in the domain of fact-checking while underscoring the continued importance of human cognitive skills and the necessity for persistent advancements in AI capabilities. Finally, the experimental data produced from the simulation of this work is openly available on Kaggle.
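The scoring scheme described in this abstract can be sketched as follows, assuming one point per item whose predicted category exactly matches the verified one. The abstract does not say how partial matches were handled, so that rule is an assumption, as are the names `score` and `mean_score` and the toy labels, which are hypothetical and not taken from the study's Kaggle data.

```python
LABELS = ("True", "False", "Partially True/False")


def score(predictions, verified):
    """Number of items whose predicted label equals the verified label."""
    if len(predictions) != len(verified):
        raise ValueError("predictions and verified must have the same length")
    for label in list(predictions) + list(verified):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    return sum(p == v for p, v in zip(predictions, verified))


def mean_score(scores_by_model):
    """Unweighted average of the per-model scores."""
    return sum(scores_by_model.values()) / len(scores_by_model)


# Hypothetical toy data: four items, two made-up models.
verified = ["True", "False", "Partially True/False", "False"]
toy_predictions = {
    "model_a": ["True", "False", "False", "False"],
    "model_b": ["False", "False", "Partially True/False", "True"],
}
toy_scores = {name: score(p, verified) for name, p in toy_predictions.items()}
print(toy_scores, mean_score(toy_scores))
```

With 100 items per model, as in the study, each score reads directly as a value out of 100, and the reported average would be the unweighted mean over the four models.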

caramancion, information, llm, (17 more...)

arXiv.org Artificial Intelligence

2306.17176

Country:

North America > United States > Wisconsin > Dunn County > Menomonie (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.69)

Industry:

Media > News (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)


To Test Machine Comprehension, Start by Defining Comprehension

Dunietz, Jesse, Burnham, Gregory, Bharadwaj, Akash, Rambow, Owen, Chu-Carroll, Jennifer, Ferrucci, David

arXiv.org Artificial Intelligence | May-11-2020

Many tasks aim to measure machine reading comprehension (MRC), often focusing on question types presumed to be difficult. Rarely, however, do task designers start by considering what systems should in fact comprehend. In this paper we make two key contributions. First, we argue that existing approaches do not adequately define comprehension; they are too unsystematic about what content is tested. Second, we present a detailed definition of comprehension -- a "Template of Understanding" -- for a widely useful class of texts, namely short narratives. We then conduct an experiment that strongly suggests existing systems are not up to the task of narrative understanding as we define it.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2005.01525

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(19 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (0.89)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
